Adaptive auditory pattern recognition system for driver training and testing equipment

ABSTRACT

A driver trainer simulator involving a movie film for portraying a series of driving situations projected on a screen in view of a driver station on a driver trainer unit, the movie film being encoded with electrical signals which correspond to successive driving situations on the film and are compared with signals received from the driving controls of the driver trainer upon being operated by a student, an instantaneous display panel for indicating the correctness of the response by the student to each of the successive driving situations and a permanent magnetic recording means for storing the comparative information. An adaptive stressor unit having an audio unit may be simultaneously used to provide secondary auditory perceptual loading on the student. An auditory pattern recognition device including an audio pickup may be provided to receive the audible responses from the student in response to the sound patterns from the stressor unit. An indicator is provided for indicating correct and incorrect audible responses as compared with the audible responses from the audio unit of the stressor unit. The recognition device indicator may be coupled to the stressor unit to control the rate of loading of the auditory signals on the student.

United States Patent 3,591,931

[72] Inventor: D n, Ames, Iowa
[21] Appl. No.: 8
[22] Filed: Oct. 30, 1969
Patented: July 13, 1971
[73] Assignee: Iowa State University Research Foundation, Ames, Iowa
Continuation-in-part of application Ser. No. 655,045, July 21, 1967, now Patent No. 3,523,374
Primary Examiner: Wm. H. Grieb
Attorney: Zarley, McKee & Thomte

[54] ADAPTIVE AUDITORY PATTERN RECOGNITION SYSTEM FOR DRIVER TRAINING AND TESTING EQUIPMENT
9 Claims, Drawing Figs.
[52] U.S. Cl.: 35/22
[51] Int. Cl.: G09b 9/02
[50] Field of Search: 35/11, 22

[56] References Cited
UNITED STATES PATENTS
2,370,543    5    C st    35/11
3,015,169    1/1962    Chedister    35/11
3,108,384    10/1963    Jazbutis et al.    35/11
3,186,110    6/1965    Smyth    35/11
3,251,142    5/1966    Jazbutis    35/11
3,266,173    8/1966    Sheridan    35/11
3,266,174    8/1966    Bechtol et al.    35/11
3,523,374    8/1970    Schuster    35/11


ADAPTIVE AUDITORY PATTERN RECOGNITION SYSTEM FOR DRIVER TRAINING AND TESTING EQUIPMENT

This is a continuation-in-part application of my copending application Ser. No. 655,045, filed July 21, 1967, now U.S. Pat. No. 3,523,374, issued Aug. 11, 1970.

Numerous studies have been made which indicate that neither group therapy nor special training classes are effective in causing problem drivers to improve in their driving skills more than the improvement resulting from those left alone for the same period of time. Group psychotherapy for problem drivers has also been used and proven ineffective. The results and conclusions from extensive experimentation and studies indicate that group psychotherapy, group driver improvement meetings, and special training for problem drivers are not effective in assisting problem drivers to improve their subsequent violation and accident records, compared to a control group. This invention is directed to what is believed to be effective training equipment for problem drivers. This invention combines realistic driver-training films with auditory shadowing, which requires the subject to recognize and repeat spoken digits. Problem drivers will be induced to change their driving habits on the basis of undergoing operant training via realistic simulator and training movies as well as by the inclusion of spoken digit tracking as stress.

The theory of the use of this equipment is that the training group, utilizing operant conditioning to learn to cope with accident situations, will utilize driver simulators to respond to potential accident scenes in films. The drivers thus should consciously learn to avoid the accident situation 100 percent of the time. However, the problem with problem drivers is that they know consciously the correct driving behavior to avoid accidents and moving violations, but they do not manifest this behavior while driving. Therefore, this is corrected by having the students learn to respond to driving situations while they are stressed simultaneously. Thus the problem drivers will have learned how to react safely and subconsciously to accident driving situations in real life without having to stop and think about it, and in spite of a retained problem driving attitude.

Drivers will have the usual mechanical aspects of the car in the simulator, and in addition feedback devices will be employed. For every safe response made to a potential accident situation shown in a driving scene, the driver will receive a green light to signify correct behavior. If the driver responds with some inappropriate response or too late, a legend panel will be lighted indicating the correct response that the driver should have taken to avoid the accident. Response scores will be recorded for each driver, and each driver will go through the driving films repetitively until he achieves the score of 99 percent correct responses on all of the potential accident scenes. A level of stress is added to that imposed by reacting correctly to potential accident situations depicted on the motion picture film. The controllable stress will be secondary perceptual loading. Each driver will be presented with a random auditory string of spoken zeros and ones. As soon as he hears a digit, the driver must simply repeat it aloud. A simple auditory pattern recognition device is used to determine whether the driver responded with a zero or a one, and whether he responded correctly within the appropriate time interval. The perceptual loading is adjusted for each subject by changing the rate of presenting the spoken numerals so that the accuracy of the secondary task automatically is kept at a level of 90 percent.

These and other features and advantages of this invention will become readily apparent to those skilled in the art upon reference to the following description when taken into consideration with the accompanying drawings, wherein:

FIG. 1 is a general functional diagram of the present invention;

FIG. 2 is a functional diagram of the adaptive stressor data flow circuitry;

FIG. 3 is a functional diagram of the digit-tracking adaptive stressor details;

FIG. 4 is a functional diagram of the adaptive auditory pat ... generally by the reference numeral 10 and includes a seat 12.

The usual driving controls are located on the front panel 14 of the simulator 10. A single earphone headset 16 is provided for the trainee who, although not shown, will sit in the seat 12.

A movie film 18 that goes with the conventional driver simulator is used as a basic information storage element. On this 16 mm. film are the conventional scenes from driver training studies. At one side of the film images is the conventional sound track 20. On the other side of the film will be an added ferrite strip 22 to record the necessary commands for this particular driver simulator.

On this ferrite strip 22 possible commands are encoded. The particular commands are detailed in FIG. 20 of my copending application. The commands are detected as tones on the ferrite strip as it runs through the movie projector and are translated into the appropriate command per channel. The presence or absence of the tone is taken to be the presence or absence of the command for that particular channel. The tone frequencies are decoded in the channel command decoder as a logical bit, that is, a zero or one.

The 22nd channel, at a nominal frequency of 5,600 cycles per second, or Hertz, is used to control the speed of the projector. This frequency in the 22nd channel is indicative of how fast the projector film is running at that particular instant. This frequency, indicating projector speed, is compared in a frequency discriminator with the command frequency from a reference oscillator whose frequency is fixed stably at 5,600 Hertz. The difference is the error which controls the speed unit; this in turn controls the projector motor such that the speed of the film is controlled to be constant within very narrow frequency-speed limits.
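The speed regulation just described can be sketched as a simple feedback loop. This is only an illustrative model: the proportional gain, the step count, and the function name `regulate` are assumptions, and the actual speed unit is not specified here beyond being driven by the discriminator error.

```python
REFERENCE_HZ = 5600.0  # reference oscillator frequency, from the text


def regulate(initial_hz: float, gain: float = 0.5, steps: int = 20) -> list:
    """Simulate the detected channel-22 tone being pulled toward the reference.

    The played-back tone frequency is taken as proportional to film speed,
    so correcting the frequency error corrects the projector speed.
    `gain` is a hypothetical proportional gain of the speed unit.
    """
    freq = initial_hz
    history = [freq]
    for _ in range(steps):
        error = REFERENCE_HZ - freq  # frequency discriminator output
        freq += gain * error         # speed unit adjusts the projector motor
        history.append(freq)
    return history


# A projector running slow (tone at 5,400 Hz) is brought back toward 5,600 Hz.
trace = regulate(5400.0)
print(round(trace[0]), round(trace[-1]))
```

With any gain between 0 and 1 in this model the error shrinks geometrically, which corresponds to the film speed settling within narrow limits around the commanded value.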

What the student does in response to the driver training film and its commands is encoded by several devices on the student console. The information for those encoding details is shown in my copending application. The motions or position information are sensed about the student's responses as to brake, clutch, gas or accelerator pedal, steering wheel, turn signal, miscellaneous switch operation, and glance direction.

The student wears a headset 16 for adaptive stressing while in the student driver console. This headset presents information in a headphone to one ear, while the other ear is free or uncovered to listen to the sound track from the driver trainer film. A microphone 24 is part of the headset to detect the student's verbal response to tracking digit commands. Also a part of the headset are two accelerometers to sense glance motions of the head. The earphone and microphone are part of the digit tracking adaptive stressor loop, presented in detail in FIGS. 2 and 3.

The responses of a student are sensed at his student console and compared with the decoded commands from the driver trainer film in the comparator logic block. The errors or differences between command and student response are recorded both on a paper strip recorder and on a magnetic tape recorder for later computer analysis. The errors are also displayed as feedback information or commands to the student on his console display panel. Details on the display panel and on the command-response comparisons and their display are presented in my copending application.

DIGIT-TRACKING ADAPTIVE STRESSOR LOOP

Studies have shown that requiring a person to repeat a string of digits that he hears is stressful in terms of disrupting learned behavior. Following digits, or digit-tracking in this case, means that the student has to divert some of his attention units from his primary task of driving the simulator to following the auditory string of digits. The rate at which he follows digits here is a function of how correct the student is; this in turn depends on how much attention he devotes (or not) to his primary task of driving the simulator. Between these two tasks, driving the simulator and following digits, I will occupy 100 percent of the student's attention.

The overall operation of the digit-tracking adaptive stressor as indicated in FIGS. 2-4 is as follows. An endless stereo tape belt 30 is used as the memory for the spoken zeros and ones digits for the student to track. The upper channel on the stereo tape belt has a continuous series of "ones" spoken on it at the rate of two digits every second. The bottom channel of this tape belt has a similar string of spoken "zeros" recorded on it. Thus one can get a spoken zero or one merely by selecting the upper channel for ones or the lower channel for zeros. The selection of the desired digit is done by the pseudorandom number generator 31. The rate at which the digits are picked off the tape sets the rate at which the spoken digits are presented in the earphone 16 to the student. The rate at which spoken digits are presented is controlled by the comparator 32, which sets the word rate according to how accurately the student is following the random string of spoken ones and zeros.

ADAPTIVE AUDITORY PATTERN RECOGNITION SYSTEM

Auditory pattern recognition techniques are used to determine whether the subject said "zero" or "one," or some other number (an error). The microphone 24 on the subject's headset detects the spoken digits and converts them into the corresponding electronic waveforms. The audio compander ensures that the volume level feeding the pattern recognition circuits is constant. The auditory pattern recognition scheme utilizes three different characteristics of speech in combination to determine whether a zero or a one or something else was said by the subject. The phonemic analysis of the spoken digits "0" and "1" is explained in detail below, and the electronic implementation is given in FIGS. 3 and 4 for the pattern recognition.

The output of each of the three speech characteristics circuits is coded as one or zero; that is, the output of the voicing circuit (vowel detector) could be a logical "one" indicating a spoken one, or it could be a logical "zero" indicating a spoken zero, for consistency purposes. The energy word length detector and the consonant decision circuitry (consonant frequency) operate likewise. A majority voting logic is used to ensure reliability. The majority voting logic thus requires a two out of three or a three out of three vote from the three circuits for reliable recognition. If an accuracy level of 90 percent for each circuit individually is assumed, then operating collectively the circuits should be right 99 percent of the time, or 99.9 percent of the time when three out of three circuits agree. Using the two out of three decision-making logic thus means that an error would be made only once out of 100 times, when two out of the three decisions are wrong and the minority vote was correct. The output of the pattern recognition circuitry, the majority logic output, is a logical zero or one, corresponding to a spoken zero or one. This is compared with the command word and used to adjust the rate of presenting digits to the subject.
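The reliability argument for majority voting can be checked with a short binomial calculation. The sketch below assumes, purely for illustration, that the three circuits err independently and that each is right with probability p = 0.9; both the independence and the value of p are assumptions. Under this idealized model the exact figures are close to, but not identical with, the round numbers quoted in the text.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent circuits are correct."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.9  # assumed per-circuit accuracy (illustrative value)

majority_correct = prob_at_least(2, 3, p)  # two or three circuits are right
all_three_wrong = (1 - p) ** 3             # every circuit wrong at once

print(f"majority vote correct: {majority_correct:.3f}")
print(f"all three circuits wrong: {all_three_wrong:.3f}")
```

The point of the calculation is that the vote is wrong only when at least two circuits fail together, which is much rarer than a single circuit failing.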

The word rate adjusting feature works as follows: if the student is correct, the rate of selecting and presenting words to the subject increases slightly with each successive correct following of digits. If the word recognized is wrong, opposite to that presented to the subject, the rate will decrease abruptly down to some minimum level of presenting digits to the subject. A possible minimum level is one digit every 3 to 5 seconds and a possible maximum level could be two digits per second.
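The rate adjustment rule can be sketched in a few lines. The limits come from the text (one digit every 5 seconds, two digits per second); the size of the upward step and the names `next_rate` and `STEP` are assumptions made only for this illustration.

```python
MIN_RATE = 0.2  # digits per second: one digit every 5 seconds (from the text)
MAX_RATE = 2.0  # digits per second (from the text)
STEP = 0.1      # assumed increase per correct response (hypothetical value)


def next_rate(rate: float, correct: bool) -> float:
    """Return the digit presentation rate after one student response."""
    if correct:
        return min(MAX_RATE, rate + STEP)  # creep upward, capped at the maximum
    return MIN_RATE                        # abrupt drop to the minimum on an error


rate = MIN_RATE
for correct in [True, True, True, False, True]:
    rate = next_rate(rate, correct)
    print(f"{'hit ' if correct else 'miss'} -> {rate:.1f} digits/s")
```

The asymmetry (slow rise, abrupt fall) is what keeps the secondary task near the limit of what the student can follow without swamping him after an error.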

The student console responses, errors and digit rate are recorded. A paper strip recorder is used for feedback to the instructor and student analysis later. A magnetic tape recorder for feeding the data into a computer for subsequent statistical analysis is also used.

The unique features of the adaptive auditory pattern recognition system are the ones that enable the circuitry to adapt to individual characteristics of a speaker that are specific to him and that vary from speaker to speaker. Specifically, the three characteristics of speech that are used in this simple auditory pattern recognition are the energy or length of the spoken word, the frequency characteristics of initial consonants, high or low, and lastly the frequency characteristics of the initial vowel, whether low or high frequency. Further, this circuitry is restricted to discriminating between the spoken digits "zero" and "one" or deciding that neither of these digits had been spoken.

SIGNATURE ANALYSIS OF "ZERO" AND "ONE"

FIG. 4 is the block or system diagram and gives the overall data flow for the individual adaptive circuits of the digit recognizer 39 (FIG. 2).

At the top of FIG. 4 is the data flow 40 that is common for all three adaptive circuits. The output of a microphone 24 is amplified and held to a constant level by the automatic gain control (AGC) circuitry 42. The constant amplitude output goes to the three adaptive circuits 44, 46 and 48, as well as to word and vowel detectors 50 and 52. These detectors have been described in my copending application. Briefly, the word detector 50 determines when a word has started and later when it has stopped on the basis of the sound energy's exceeding a certain threshold level. The vowel detector 52 determines when the vowel in the first syllable of a spoken word has started and when it stopped. The vowel detector 52 ignores the preliminary consonantal sounds and detects only the vowel sounds by virtue of the relatively constant frequency energy of a vowel as compared to a consonant.

The adaptive energy word length decision circuit 44 is shown in FIG. 4. The constant amplitude audio frequency energy goes to the word length circuit 44. The word length circuit includes a flip-flop or bistable multivibrator 54 that turns on whenever a word is spoken and turns off again when the word is finished. This also is known as a Schmitt trigger, operating around a certain preset voltage level. The word length output pulse has a constant amplitude but its length varies with the length of the spoken digit. The integrator-decider circuit 56 puts out a pulse whenever the word length pulse has exceeded a minimum time duration. The output of the integrator-decider 56 goes to a memory flip-flop 58 which remembers whether the integrator-decider 56 put out a pulse corresponding to the long word "zero" or whether the integrator 56 had put out no pulse in response to the short word "one."

The output of the word length memory flip-flop 58 goes to majority logic decision-making pattern recognition circuitry as shown in FIG. 5, as well as to the length decision averager 60. The averager 60 keeps track of how many decisions of each type have been put out by the integrator-decider 56. The averager circuit 60 is enabled by the word detector 50, and over approximately a 1 minute period of time averages the word and the integrator-decider 56 pulses. If the string of spoken digits is 50 percent zeros, the averager 60 does not affect the integrator 56 bias. If the average is not 50 percent zeros, then a correction bias is applied to the integrator 56 such that the minimum time is lengthened or shortened appropriately to bring the percentage of zero decisions back to nearly 50 percent.

FIG. 7 gives circuit details of the adaptive word length decider circuit 44. This circuit has a unijunction multivibrator 62 with an RC integration circuit 63 receiving the output 55 of the word length flip-flop 54 and in turn feeding the emitter 64. Whenever the emitter 64 to base 66 voltage becomes sufficient, the unijunction oscillator 62 puts out a negative pulse. This pulse is lengthened and stored in the memory flip-flop 58. The word pulse feeds in with a gain of -1 to the operational amplifier 60 operating as an integrator, whereas the memory flip-flop 58 pulse 57 enters with a gain of +2. If exactly half of the words are long ones (zeros), the two inputs will neutralize each other over time, and the output 67 of the averaging integrator will be zero; it will apply zero correction bias to the unijunction decider-integrator 56.

However, if the speaker had been talking fast, such that the circuit did not recognize that some of the quickly spoken zeros were actually zeros, then the word length integrator 56 would have more "one" decisions than "zero" decisions. The fewer-than-normal zero decisions would produce only a few pulses 57 from the memory flip-flop 58 to feed to the integrator-averager 60. However, the same number of word pulses 61 would be feeding this integrator 60 as before. The negative word pulses 61 would result in a positive output 67 of the integrator-averager 60. This positive output 67 after approximately 1 minute would be fed back as a correction bias 67 to the unijunction oscillator 62 and increase its bias. Then the word length pulse 61 would not have to be quite as long as before to make unijunction oscillator 62 fire and get more "zero" decisions. Thus the circuit automatically adjusts its decision point such that in the long run, the word length decider circuit 44 decides that half of the spoken words on the basis of word length are "zeros" and the remainder are "ones." A similar situation exists for the correction bias 67 when more "zeros" are recognized than are actually spoken. The correction bias 67 would be negative and would require that the word length pulse 61 be longer than previously to result in a decision of "zero." This would compensate for the person who is speaking more slowly than average.
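The self-adjusting decision point can be illustrated with a discrete-time sketch. The averager is modeled as a running balance that gains 1 for every word and loses 2 for every "zero" decision, which mirrors the -1 and +2 input gains followed by the sign inversion of the integrator; the block length, the loop gain, the example word durations, and the function name `adapt_threshold` are all assumptions.

```python
def adapt_threshold(durations, threshold, gain=0.05, block=20):
    """Adapt the minimum duration (seconds) that counts as a spoken "zero".

    durations: measured word lengths in seconds.
    threshold: initial minimum duration for a "zero" decision.
    gain, block: hypothetical loop gain and averaging length in words.
    Returns the final threshold and the list of decisions (True = "zero").
    """
    balance = 0  # averager state: +1 per word, -2 per "zero" decision
    decisions = []
    for i, duration in enumerate(durations, start=1):
        is_zero = duration >= threshold
        decisions.append(is_zero)
        balance += 1 - (2 if is_zero else 0)
        if i % block == 0:
            # Positive balance: too few zeros, so require a shorter word.
            # Negative balance: too many zeros, so require a longer word.
            threshold -= gain * balance / block
            balance = 0
    return threshold, decisions


# A fast talker: "zero" lasts 0.47 s and "one" lasts 0.25 s (assumed values),
# while the decider starts out requiring 0.6 s for a "zero".
final_threshold, decisions = adapt_threshold([0.47, 0.25] * 100, threshold=0.6)
print(round(final_threshold, 2), sum(decisions[-20:]))
```

In this example the decider at first labels every word a "one"; the correction then shortens the required duration block by block until half of the decisions are "zeros", after which the balance stays at zero and the threshold stops moving.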

The adaptive circuitry 46 for the initial consonant decision is shown in FIG. 4. High- and low-pass RC filters 70 and 72 are used to separate the energy bands of interest in the spoken "ones" and "zeros." The amounts of high frequency energy (a spoken "zero") and low frequency energy (a spoken "one") are detected by the appropriate envelope detectors 74 and 76. These are diode detector circuits 74A and 76A in FIG. 8 whose outputs feed a differential amplifier 78 as a comparator. The output of the comparator 78 is positive or negative depending on whether a high frequency consonant or low frequency consonant has been spoken.

The output 80 of the comparator 78 is sampled during a restricted period of time. The time period of interest for the initial consonant decision is the time between when the spoken digit starts, as determined by the start of the word detector pulse 82, and the start of the vowel in the first syllable, as determined by the start of the vowel pulse 84. The sampling here is started by the word pulse and stopped by the vowel pulse. The complement of the vowel pulse 84 is actually used to enable the duration of the time between the start of the word and the start of the vowel in the first syllable. Thus the output of the differential amplifier comparator 78 is sampled only during the initial consonant of a word. The decision on the ratio of the high and low frequency components is remembered for the duration of the word pulse 82.

The output of the initial consonant decision circuit goes to the majority logic pattern recognition circuitry (FIG. 5), as well as to the decision averager 86. The decision averager 86 operates as described above. It takes the long term average of the spoken word and uses this output as a correction. The correction may be applied as a bias 88 to the differential amplifier comparator 78 to raise or lower its threshold appropriately to decide upon a high or low frequency initial consonant. Changing the comparator bias 88 may not be sufficient when the cutoff frequencies in the consonant low- and high-pass filters 70 and 72 are not set appropriately. The cutoff frequency may actually be changed by changing the effective resistance in the circuit.

FIG. 8 shows details of how the cutoff frequency of the high- and low-pass filter circuits 70 and 72 may be changed electronically by changing the resistance in the circuit. The capacitance also could be changed by a -90° phase shift circuit operating as a capacity multiplier. Or a +90° phase shift circuit could be utilized as an inductance multiplier circuit. Field effect transistors also may be used as controlled resistors.

A high pass filter circuit 70 with a variable resistance R is shown in FIG. 8. The capacitor C stays the same in both the idealized circuit at the left and in the practical circuit implementation at the right, where the dynamic resistance of a forward biased diode 74A provides the controllable resistance R. Relatively small signals are fed into this circuit so that the diode 74A does not become nonlinear. Thus the signal meets only the relatively constant dynamic resistance of the diode 74A, which is controlled both by an adjusting bias 90 and by a correction bias 88 from the decision-averager 86. When a large positive bias is applied to the diode, the forward current through the diode increases and the dynamic resistance is low. When the positive bias is reduced towards zero, the resulting dynamic resistance is large. Care must be exercised not to reduce the total bias to zero or the diode 74A will start operating nonlinearly in its reverse bias region for part of the signal.
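The relation between diode bias and filter cutoff can be made concrete with two standard formulas: the small-signal resistance of a forward biased diode, r = nV_T / I, and the RC cutoff frequency, f_c = 1 / (2πRC). The sketch below is an idealization that ignores the isolation resistors and any series resistance; the capacitor value, the bias currents, and the function names are assumptions chosen only for illustration.

```python
import math

THERMAL_VOLTAGE = 0.02585  # volts, kT/q at roughly 300 K


def dynamic_resistance(bias_current_a: float, ideality: float = 1.0) -> float:
    """Small-signal resistance (ohms) of a forward biased diode, r = n*V_T/I."""
    if bias_current_a <= 0:
        # Matches the caution in the text: the total bias must stay positive.
        raise ValueError("forward bias current must be positive")
    return ideality * THERMAL_VOLTAGE / bias_current_a


def rc_cutoff_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """Cutoff frequency (Hz) of a first-order RC filter, f_c = 1/(2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


C = 0.1e-6  # assumed capacitor value, 0.1 microfarad
for bias_current in (10e-6, 100e-6, 1e-3):  # assumed bias currents in amperes
    r = dynamic_resistance(bias_current)
    print(f"I = {bias_current:.0e} A, r = {r:8.1f} ohm, f_c = {rc_cutoff_hz(r, C):9.1f} Hz")
```

A larger bias lowers the dynamic resistance and so raises the cutoff frequency, which is how the correction bias can move the filter's cutoff for an individual speaker.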

Also shown in FIG. 8 is a low pass circuit 72 with a variable series resistance R. The right half shows how the diode 76A is connected to give the effective resistance R as the forward dynamic resistance of the diode 76A. As before, suitable isolation resistors are employed to apply the summed correction and adjusting bias voltages 94 to the diode. The total of the adjusting and correction bias (90 and 88) applied to the diode (74A or 76A) in either filter configuration 70 or 72 must range from some small positive voltage to a large positive voltage. Zero and negative total bias voltages are not allowed.

The bottom part of FIG. 4 shows the block diagram for the initial vowel decision circuitry 48. This circuitry is quite similar to the circuitry for the initial consonant decision. The difference is that the memory flip-flop samples the output of the differential amplifier comparator only when the vowel in the first syllable is present. Another difference is that the RC cutoff frequency of the high- and low-pass vowel filter circuit is different from the cutoff frequency for consonants. Otherwise, the initial vowel decision circuitry operates the same as the initial consonant circuitry. Both circuits use high- and low-pass diode filter circuits wherein the diode is controlled to vary the actual cutoff frequency as shown in FIG. 8.

The output of the initial vowel decision circuitry goes to the majority logic voting pattern recognition circuitry as shown in FIG. 5.

The low- and high-pass RC circuits 70 and 72 may be added together in series as desired to give a sharper cutoff frequency. In this case an isolation amplifier with a high input impedance should be added between the filter sections.

PHONEMIC ANALYSIS OF THE SPOKEN DIGITS "0" & "1"

An idiosyncratic phonemic analysis was done (see page 8). The two spoken digits "0" & "1" can be spoken with minor variations. The word "zero" alternatively can be pronounced "Z'-r5w" or "Z'e'r 'oh," and the word "one" can be pronounced "Wuhn" or "Ooh-Wuhn." The vowel sounds and consonant sounds were considered separately by their relative frequencies. The vowel sounds considered were i, e, o, ooh, uh, ranked in this order of frequency from low to high, indicated by V1 through V5 respectively. The consonant sounds similarly were ordered on the basis of relative frequency as follows: w (swept upward), r, n (slight upward sweep), z, from lowest to highest and indicated by C1 through C4 respectively. The time duration of the consonant and vowel sounds was judged to be relatively constant.

For further reference, see Geldard (1953).* In particular, certain consonant sounds have two or more frequency bands. As a result, the above first approximation may have to be modified.

*Geldard, F. A. The human senses. New York: Wiley, 1953. x, 365 pp. P. 100.

. /(C2U )V3 The spoken digit "one" can be characterized as (UV5) C1V4C3/-. These two set theory notations for the spoken digits can also be considered to be their frequency-time signatures. It is these frequency and time signatures that are used in the electronic implementation of circuits to differentiate between these two digits.

SIGNATURE RECOGNITION LOGIC

The two wave forms in FIG. 6 show the approximate duration of the set pulse and that of the first vowel pulse; these time or enable the remaining circuitry. The set pulse starts whenever energy is detected by the envelope detector for any spoken word. The duration is about 0.7 of a second but is adjustable. Whenever the set pulse disappears, the reset pulse is present and is used to reset the circuitry. The first vowel pulse appears whenever the relatively constant energy of a vowel is detected and is approximately 0.1 seconds or sooner after the set pulse has appeared at the start of a spoken digit. This first vowel duration is estimated to last 0.1 to 0.3 seconds and its pulse duration is adjustable. The first vowel pulse thus indicates the presence and duration of the first vowel in the spoken digits "zero" or "one." The first vowel pulse obviously is zero when the second vowel in "0" appears.

FIG. 5 shows the majority logic decision-making in the digit recognition. The first operational amplifier performs the implementation of recognizing a spoken "0." The three characteristics going into the recognition are that the energy length or syllable counter had to have counted two syllables, that the initial consonant be high frequency ("z") and that the initial vowel be a low frequency one ("e"). A level adjustment is used to ensure that if three out of these three characteristics or two out of the three characteristics agree, they will outweigh the negative 10-volt input and produce a negative output indicative of "zero." If only one of the three characteristics deemed necessary for a spoken "zero" is present, the -10 volts overrides this single characteristic and the output of the operational amplifier is positive, signifying "not zero." The operational amplifier output is examined only when a delayed set pulse is present. The delay is to let the energy length-syllable counter operate fully. Thus the memory flip-flop can be set only when a delayed set pulse is present or immediately after a word has been spoken into the microphone. The output is a logical zero whenever something other than "zero" has been spoken and is a logical one whenever "zero" has been spoken. This somewhat confusing and arbitrary decision preserves a positive logic of "1" meaning a given signal exists. A similar situation arises with a spoken "one."

The bottom part of the circuitry illustrates the implementation of recognizing a spoken "one." The three characteristics of importance here are that the energy length-syllable counter had to have counted just one syllable, an initial low frequency consonant sound ("w"), and an initial high frequency vowel sound ("uh"). A majority logic scheme is used: if two out of these three, or if three out of these three characteristics are present, the digit is recognized as "one." The logic voltage levels are such that two out of three or three out of the three inputs override the level adjust input and the output correspondingly will be negative. If merely any one out of these three characteristics is present, its input will be insufficient to override the -10 volts coming from the level adjust. Then the sum of the input voltages will be negative, resulting in a positive output indicating "not one" was present.

A recognition error occurs if both the "0" and "1" outputs are zero. This can occur if the person said something other than a "0" or "1" or if the recognition circuitry had been set incorrectly. An error recognition light on the panel would be lit and the driver would have to repeat his digit into the microphone while the technician made adjustments to the recognition circuitry. A combination of logic and circuits can be used to implement the majority voting scheme instead of the analog method shown here.
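As one illustration of a logic implementation of the voting scheme, the following sketch takes the three characteristic decisions as inputs and produces the recognized digit or an error. The encoding of each decision as 0, 1, or None (characteristic not detected), and the function name `recognize_digit`, are assumptions for the example and are not taken from FIG. 5.

```python
from typing import Optional, Sequence


def recognize_digit(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Majority vote over the word length, consonant, and vowel decisions.

    Each vote is 0 (looks like "zero"), 1 (looks like "one"), or None
    (the characteristic was not detected). Returns 0 or 1 when at least
    two of the three votes agree, otherwise None for a recognition error.
    """
    if len(votes) != 3:
        raise ValueError("exactly three characteristic votes are expected")
    zero_output = sum(v == 0 for v in votes) >= 2  # output of the "0" recognizer
    one_output = sum(v == 1 for v in votes) >= 2   # output of the "1" recognizer
    if zero_output:
        return 0
    if one_output:
        return 1
    return None  # both outputs are zero: light the error indicator


print(recognize_digit([0, 0, 1]))
print(recognize_digit([1, 1, 1]))
print(recognize_digit([0, None, 1]))
```

Because at most one digit can collect two of the three votes, the two recognizer outputs can never both be one; the only ambiguous case is the one the text names, where both are zero.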

ADAPTIVE STRESSOR

The free-running oscillator (FIG. 2) is used as the pseudorandom number generator 31. It has outputs of logical zero or one, corresponding to zero volts or +10 volts. Since the maximum rate of speaking the digit "zero" or "one" is 2 per second, it is assumed that the output of the free-running oscillator would indeed be sampled at random time intervals. The purpose of the sample-and-hold switch 33 is to examine the output of the free-running oscillator 31 and then to retain this information for the duration of one spoken digit. The sampling is done within 0.1 milliseconds and the oscillator 31 state of +10 volts or zero volts is determined and then held for the duration of a spoken digit. A bias in the ratio of spoken zeros and ones can be introduced by adjusting the ON time of the free-running oscillator 31; if the ON time of the oscillator 31 is greater than 50 percent, then more ones will be commanded.
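The sample-and-hold scheme can be mimicked in software by reading the state of a fast square wave at unrelated instants. In the sketch below the oscillator period, the gaps between samples, and the function names are assumptions; only the idea of sampling a free-running oscillator whose ON fraction sets the proportion of ones comes from the text.

```python
import random


def oscillator_state(t: float, period: float, on_fraction: float) -> int:
    """State of a free-running square-wave oscillator at time t (1 while ON)."""
    phase = (t % period) / period
    return 1 if phase < on_fraction else 0


def command_digits(sample_times, period=0.0013, on_fraction=0.5):
    """Sample the oscillator at each time; each sample is one commanded digit."""
    return [oscillator_state(t, period, on_fraction) for t in sample_times]


# Digits are requested at irregular moments, far apart compared with the
# oscillator period, so the sampled phase is effectively random.
rng = random.Random(0)  # seeded only to make the illustration repeatable
times, t = [], 0.0
for _ in range(2000):
    t += rng.uniform(0.5, 5.0)  # assumed gap between digits, in seconds
    times.append(t)

digits = command_digits(times, on_fraction=0.7)
print(f"fraction of ones commanded: {sum(digits) / len(digits):.2f}")
```

Setting `on_fraction` above 0.5 biases the string toward ones, which corresponds to lengthening the ON time of the oscillator.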

The command word, the stretched output of the free-running oscillator 31, is used to select a spoken zero or one off the endless two track tape loop via selecting either the bottom track (0) or the top track (1). The word selector 33A is turned on by the gating switch 33 and continues sampling one tape track until the start and stop of a spoken digit is detected.

The word start-stop detector 33B is necessary to ensure that a spoken digit is picked off the tape. The word selector 33A is started by an output of the sampling switch 33 and is stopped whenever the spoken digit from the tape 30 stops. The output of the word start-stop detector 33B passes a spoken digit, a "zero" or "one," to the headphone 16 for the student driver to listen to.

The command-spoken word comparator does just that. The student driver has to repeat into the microphone 24 whether he thought he heard a spoken "zero" or a spoken "one." The digit recognition circuitry 39 has an output, a "one" or "zero," appropriately. The comparator circuit 32 compares the command word with the spoken word recognized. If the command and recognized words agree, this is counted as a hit and a correct pulse results. If the outputs disagree, the correct pulse is missing and an error is recorded in the counter and on the two recorders.

The command-recognized word comparator 32 adjusts the spoken digit rate to the student driver. To allow for the student driver's having to pay attention to driving and ignoring the spoken words, the word start-stop detector 33B energizes a wait multivibrator 33C for a maximum delay of 5 seconds, which is about the maximum permissible in recognizing a digit. If the student ignores the digit for more than 5 seconds, the rate of presenting further digits is slowed down. The outputs of the wait multivibrator 33C and the commanded-word recognized-word comparator 32 are summed in an AND gate 102. If a correct pulse occurs within 5 seconds after speaking a digit, a pulse is sent to the unijunction multivibrator 104 whose frequency is controlled by voltage. The frequency of the voltage-controlled oscillator 104 increases for each correct pulse and drifts downward each time an error is made. The lower limit is 0.2 pulses per second and the high limit is 2.0 pulses per second, corresponding to two spoken digits per second.
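The adaptive rate behavior just described can be summarized in a brief sketch. The 5-second window and the limits of 0.2 and 2.0 pulses per second come from the text; the size of each upward and downward step is not given, so the 0.1 value used here is purely an assumption, as are the function names.

```python
MIN_RATE = 0.2       # lower limit, pulses per second (from the text)
MAX_RATE = 2.0       # upper limit, pulses per second (from the text)
WAIT_WINDOW_S = 5.0  # wait multivibrator window in seconds (from the text)


def correct_pulse(commanded, recognized, response_delay_s):
    """AND of the comparator agreement and the wait window still being open."""
    agrees = recognized == commanded
    in_window = response_delay_s <= WAIT_WINDOW_S
    return agrees and in_window


def next_rate(rate, pulse, step_up=0.1, step_down=0.1):
    """Raise the digit rate on a correct pulse, lower it otherwise.

    ASSUMPTION: the step sizes are hypothetical; the source states only
    the direction of change and the two limits.
    """
    rate = rate + step_up if pulse else rate - step_down
    return min(MAX_RATE, max(MIN_RATE, rate))
```

A late or wrong response both produce no correct pulse, so in this sketch a digit ignored for more than 5 seconds slows the presentation rate in the same way as a recognition error.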

Thus it is seen that the device accomplishes all of its stated objectives.

I claim:

1. A driver simulator, comprising,

a movie film for portraying a series of driving situations projected on a screen in view of a driver station on a driver trainer unit,

an adaptive stressor unit having an audio unit which provides secondary auditory perceptual loading sound patterns on the student in the driver trainer unit,

an auditory pattern recognition device that includes an audio pickup to receive the audible responses from the student in response to the sound patterns from the stressor unit,

said auditory recognition device including an indicator for indicating correct and incorrect audible responses from the student as compared with the sound patterns from the audio unit of the stressor unit,

said recognition device indicator being coupled to said stressor unit to control the rate of loading on the student, and

said auditory recognition device including three adaptive circuit means for registering the energy and length of the spoken word, the frequency of the initial consonants, and the frequency of the initial vowel.

2. The structure of claim 1 wherein said audible responses from the student include only the words zero and one.

3. The structure of claim 2 wherein said auditory recognition device includes a word detector responsive to the sound energy exceeding a predetermined threshold level for determining when a word has started and when it has stopped, a vowel detector for determining when the vowel in the first syllable of the spoken word has started and when it has stopped by recognizing the constant frequency energy of the vowel.

4. The structure of claim 3 wherein the adaptive means for registering the energy and length of the spoken word includes a word length circuit adapted to be turned on in response to a word being spoken and turned off again when the word is finished, said word length circuit having an output when turned on received by an integrator-decider circuit adapted to provide an output whenever the word length pulse has exceeded a predetermined minimum time duration, a memory means receives an output pulse from said integrator-decider corresponding to the long word zero and registers no pulse in response to the short word one, and the output of the memory means is received by a majority logic decision making circuit.

5. The structure of claim 4 wherein the output of the memory means is received by a decision averager adapted to register the number of pulses received from the integrator-decider, said decision averager being enabled by said word detector such that over approximately 1 minute period of time the word and integrator-decider pulses are averaged, upon the average falling below 50 percent zeros a correction bias is applied to the integrator such that the minimum time is lengthened or shortened approximately to bring the percentage of zero decisions to approximately 50 percent.

6. The structure of claim 4 wherein the adaptive means for registering the frequency of the initial consonants includes high- and low-pass filters adapted to separate energy frequency bands, and a pair of envelope detectors are adapted to receive the output of the high- and low-pass filters which in turn feed a differential amplifier comparator, the output of the comparator being positive or negative depending on whether a high frequency consonant or a low frequency consonant has been spoken, a memory means connected to said comparator and said vowel and word detectors for receiving the output of the comparator during a restricted period of time established by when a spoken digit starts as determined by the start of the word detector pulse, and the start of the vowel in the first syllable as determined by the start of the vowel pulse whereby the restricted period is started by the word pulse and stopped by the vowel pulse and the memory means receives a signal from the comparator only during the initial consonant of a word, and the ratio of the high and low frequency components is remembered for the duration of the word pulse, and the output of the memory means is connected to the majority logic recognition decision circuit.

7. The structure of claim 6 wherein said adaptive means for registering the frequency of the initial vowel includes high- and low-pass filters adapted to separate energy frequency bands, a pair of envelope detectors are adapted to receive the output of the high- and low-pass filters which in turn feed a differential amplifier comparator, the output of the comparator being positive or negative depending on whether a high frequency vowel or a low frequency vowel has been spoken, a memory means connected to said comparator and said vowel detector for receiving the output of the comparator during a restricted period of time established by when a vowel in the first syllable is present as determined by the vowel detector pulse being received by the memory means, and the ratio of the high- and low-frequency components is remembered for the duration of the word pulse; and the output of the memory means is connected to the majority logic recognition circuit.

8. The structure of claim 7 wherein said majority logic recognition circuit includes an output signal indicating a one or a zero was spoken depending on the output of energy and length adaptive circuit, the frequency of the initial consonant adaptive circuit, and the frequency of the initial vowel adaptive circuit compared with a predetermined output pattern, the outputs of two or more of said adaptive circuits controlling the output of said majority logic recognition circuit.

9. The structure of claim 1 wherein a majority logic recognition circuit is connected to the outputs of said three adaptive circuit means and the output signal thereof indicates a one or zero was spoken depending on the output of said three adaptive circuit means compared with a predetermined output pattern, the outputs of two or more of said three adaptive circuit means controlling the output of said majority logic recognition circuit.
